INTRODUCTION

Given two random samples $(X_i)_{i=1}^{m}$ and $(Y_j)_{j=1}^{n}$, we want to test the hypothesis that $F_X = F_Y$ against different possible alternatives. Here we are mostly concerned with change of location:

$$F_Y(x) = F_X(x - \theta).$$

In Chapter 1, we review the classical parametric and non-parametric procedures that are currently used. In Chapter 2, we introduce some new test statistics obtained from Parzen's new formulation of the problem (1977). In Chapter 3, we present the results of simulations comparing these different procedures on a wide range of underlying distributions.

In Chapter 4, we document the use of a computer package developed here, including some new graphical displays.

CHAPTER 1

CLASSICAL PROCEDURES

A. PARAMETRIC PROCEDURES

If we assume that both the X-sample and the Y-sample are normally distributed with respective means $\mu_1$ and $\mu_2$ and common unknown variance $\sigma^2$, it is well known that the likelihood ratio test for testing $H: \mu_1 = \mu_2$ versus $K: \mu_1 \neq \mu_2$ is given by: reject H if

$$|t| = \frac{|\bar X - \bar Y|}{s\,\sqrt{1/m + 1/n}}$$

is large, where $s^2$ is the pooled estimate of $\sigma^2$.

This is the common two-sided two-sample t-test, where the test statistic has a t distribution with N - 2 degrees of freedom (N = n + m).

Even when the original samples are not normal, one might argue by way of the Central Limit Theorem that $\bar X$ and $\bar Y$ are approximately normally distributed and still use the same test statistic, whose distribution is then approximately the t distribution with N - 2 degrees of freedom.

The procedure has also been extended to the case where the variances are different and unknown. One still gets an approximate t-test with the degrees of freedom being approximated by Welch's formula (1949).

It will be seen in Chapter 3 that the t-test is as good as the approximations involved.
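
The two t statistics described above can be sketched as follows. This is a minimal illustration, not code from the package documented in Chapter 4; the function names `pooled_t` and `welch_t` are ours, and the Welch degrees of freedom use the usual Welch-Satterthwaite approximation.

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    ssx = sum((v - xbar) ** 2 for v in x)
    ssy = sum((v - ybar) ** 2 for v in y)
    s2 = (ssx + ssy) / (m + n - 2)  # pooled estimate of the common variance
    t = (xbar - ybar) / math.sqrt(s2 * (1 / m + 1 / n))
    return t, m + n - 2

def welch_t(x, y):
    """Unequal-variance t statistic with approximate (Welch) degrees of freedom."""
    m, n = len(x), len(y)
    xbar, ybar = sum(x) / m, sum(y) / n
    vx = sum((v - xbar) ** 2 for v in x) / (m - 1)
    vy = sum((v - ybar) ** 2 for v in y) / (n - 1)
    se2 = vx / m + vy / n
    t = (xbar - ybar) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / m) ** 2 / (m - 1) + (vy / n) ** 2 / (n - 1))
    return t, df
```

With equal sample sizes and equal sample variances the two statistics coincide; they differ only in the degrees of freedom used for the critical value.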
B. NON-PARAMETRIC PROCEDURES

We consider only rank tests generated by linear rank statistics of the form

$$S = \sum_{i=1}^{N} c_i\, J\!\left(\frac{R_i}{N+1}\right)$$

where the $c_i$ are constants, $J(\cdot)$ is called the score function, and $R_i$ is the rank of the $i$-th observation in the combined sample. Chernoff and Savage (1958) showed how to choose $J(\cdot)$ in the situation where the underlying distributions are the same up to a shift in location ($F(x)$ and $F(x - \theta)$). Then taking

$$J(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}$$

maximizes the efficiency of the linear rank test relative to the test based on the maximum likelihood estimator $\hat\theta$.
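
As a worked check of the score-function formula (our own sketch using the standard densities, not a derivation taken from the original report), substituting three familiar location families into $J(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$ gives the score functions of the three tests discussed next:

```latex
\begin{align*}
\text{Logistic: } & F(x) = \frac{1}{1+e^{-x}},\quad f = F(1-F),\quad f' = f\,(1-2F)
  \;\Rightarrow\; J(u) = 2u - 1, \\
\text{Normal: } & f(x) = \tfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2},\quad f'(x) = -x\, f(x)
  \;\Rightarrow\; J(u) = \Phi^{-1}(u), \\
\text{Double-exponential: } & f(x) = \tfrac12 e^{-|x|},\quad f'(x) = -\operatorname{sign}(x)\, f(x)
  \;\Rightarrow\; J(u) = \operatorname{sign}\!\big(F^{-1}(u)\big) = \operatorname{sign}(2u-1).
\end{align*}
```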


1. The Wilcoxon Test

The Wilcoxon rank sum statistic corresponds to $J(u) = 2u - 1$; that makes it the most efficient of the rank tests when the underlying distribution is logistic. Actually we use a modified version

$$W = \sum_{i=1}^{m} R_i ,$$

the sum of the ranks of the X-sample in the combined sample.

This statistic is asymptotically normal with mean $\tfrac{1}{2}\, m(m + n + 1)$ and variance $\tfrac{1}{12}\, mn(m + n + 1)$.

2. The Van der Waerden Test

Corresponding to an underlying normal distribution the score function is $J(u) = \Phi^{-1}(u)$, where $\Phi^{-1}$ is the quantile function of a Normal (0,1) distribution. The statistic is

$$V = \sum_{i=1}^{m} \Phi^{-1}\!\left(\frac{R_i}{N+1}\right).$$

This statistic is again asymptotically normal with mean 0 and variance

$$\frac{mn}{(m+n)(m+n-1)} \sum_{i=1}^{m+n} \left[\Phi^{-1}\!\left(\frac{i}{m+n+1}\right)\right]^{2}.$$

3. The Median Test

The median test statistic corresponds to

$$J(u) = \begin{cases} -1, & u < \tfrac12 \\ \phantom{-}1, & u > \tfrac12 \end{cases}$$

which is "best" for a double-exponential distribution. The median test statistic is the number of observations in the X-sample lying above the median of the combined sample. It is also asymptotically normal with mean $m/2$ and variance $\dfrac{mn}{4(m+n)}$ (approximately). We use the correction for continuity in determining approximate critical values.
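
The three rank statistics of this section, together with the means and variances of their normal approximations, can be sketched as below. This is an illustrative sketch only: the function names are ours, ties are assumed absent, and the scores are evaluated at $R_i/(N+1)$.

```python
from statistics import NormalDist

def ranks_of_x(x, y):
    """Ranks (1..N) of the X observations in the pooled sample; assumes no ties."""
    pooled = sorted(list(x) + list(y))
    return [pooled.index(v) + 1 for v in x]

def wilcoxon(x, y):
    """Rank sum of the X-sample with its asymptotic mean and variance."""
    m, n = len(x), len(y)
    w = sum(ranks_of_x(x, y))
    return w, m * (m + n + 1) / 2, m * n * (m + n + 1) / 12

def van_der_waerden(x, y):
    """Sum of normal scores of the X ranks with its asymptotic mean and variance."""
    m, n = len(x), len(y)
    N = m + n
    q = NormalDist().inv_cdf  # standard normal quantile function
    v = sum(q(r / (N + 1)) for r in ranks_of_x(x, y))
    var = m * n / (N * (N - 1)) * sum(q(i / (N + 1)) ** 2 for i in range(1, N + 1))
    return v, 0.0, var

def median_stat(x, y):
    """Number of X observations above the pooled median, with approximate mean and variance."""
    m, n = len(x), len(y)
    N = m + n
    count = sum(1 for r in ranks_of_x(x, y) if r > (N + 1) / 2)
    return count, m / 2, m * n / (4 * (m + n))
```

Each function returns the triple (statistic, mean, variance), so a two-sided approximate test compares (statistic - mean) / sqrt(variance) with a normal quantile.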
CHAPTER 2

NEW TEST STATISTICS

A. PARZEN'S FORMULATION OF THE TWO-SAMPLE PROBLEM

Parzen's approach consists in transforming the hypothesis $F_X(\cdot) = F_Y(\cdot)$ into a hypothesis of the form $D(u) = u$, $0 \le u \le 1$.

In this report we consider only the following transformation.

Let $H(x) = \lambda F_X(x) + (1 - \lambda) F_Y(x)$. $H(\cdot)$ is the distribution of the mixture of X and Y, where $\lambda$ is the proportion coming from the X distribution. Then,

$$D(u) = F_X(H^{-1}(u))$$

satisfies $D(u) = u$ when $F_X(\cdot) = F_Y(\cdot)$.

To test for $D(u) = u$, $0 \le u \le 1$, is equivalent to testing

$$d(u) = D'(u) \equiv 1$$


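or

$$\varphi(v) = \int_0^1 e^{2\pi i v t}\, dD(t) = \begin{cases} 1, & v = 0 \\ 0, & v \neq 0 \end{cases}$$

which is the familiar time series problem of testing for white noise.

We proceed as follows:

1. Form a raw estimator of $D(\cdot)$:

$$\tilde D(u) = \tilde F_X(\tilde H^{-1}(u)),$$

where $\tilde F_X$ and $\tilde H$ are the empirical distribution functions of the X-sample and of the pooled sample.

This is computed in the following way:

$$\tilde D(u) = \sum_{j \le Nu} w(j) \Big/ \sum_{j=1}^{N} w(j)$$

where $w(j) = 1$ if the $j$-th observation in the pooled sample is an X, and $w(j) = 0$ otherwise.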
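
The raw estimator can be sketched in a few lines. This is a minimal illustration assuming no ties between the samples; the function names `weights` and `raw_D` are ours and are not the subroutines of the package described in Chapter 4.

```python
def weights(x, y):
    """w(j) = 1 if the j-th smallest pooled observation is an X, else 0 (no ties assumed)."""
    tagged = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    return [tag for _, tag in tagged]

def raw_D(w):
    """Values of the raw estimator at u = j/N, j = 0..N: normalized partial sums of the weights."""
    total = sum(w)
    values, acc = [0.0], 0
    for wj in w:
        acc += wj
        values.append(acc / total)
    return values
```

For example, with x = [1, 4] and y = [2, 3] the weights are [1, 0, 0, 1], and the raw estimator takes the values 0, 0.5, 0.5, 0.5, 1 at u = 0, 1/4, 1/2, 3/4, 1.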

This procedure can be easily adapted in case of ties. In such a case, $\tilde H(\cdot)$, the empirical distribution of the pooled sample, will have some jumps of size greater than 1/N, but the definition of $\tilde D(u)$ is still the proportion of elements in the X-sample that are less than or equal to the 100u% empirical quantile of $\tilde H(\cdot)$. Of course, the definition of $w(\cdot)$ is changed accordingly.

In case of censored observations, the definition reads

$$\tilde D(u) = \sum_{t_j \le u} w(t_j)$$

where $\{t_j\}$ are the points of increase and $w(\cdot)$ represents the value of the jump.

2. Form

$$\tilde\varphi(v) = \int_0^1 e^{2\pi i v t}\, d\tilde D(t),$$

the empirical Fourier transform of $\tilde D(\cdot)$.
3. There are several possibilities at this stage. We could either test for $\varphi(v) = 0$, $v \neq 0$, or, using the autoregressive method, form a smooth estimator $\hat d(\cdot)$ of $d(\cdot)$, the derivative of $D(\cdot)$, and test $d(u) \equiv 1$.

Each of these alternatives could be done in several ways, explored in Section B.

Another possible definition for $D(\cdot)$ is $D_1(u) = F_X(F_Y^{-1}(u))$, to which corresponds $\tilde D_1(u) = \tilde F_{X,m}(\tilde F_{Y,n}^{-1}(u))$. In words, $\tilde D_1(u)$ is the proportion of elements in the X-sample that are less than or equal to the 100u% empirical quantile of $\tilde F_{Y,n}(\cdot)$, the empirical distribution of the Y-sample. The other steps are the same.

B. SOME NEW TEST STATISTICS

1. Tests based on the Fourier coefficients

Under the null hypothesis, we have that

$$\varphi(v) = 0, \qquad v \neq 0 .$$

We may use $\tilde\varphi(\cdot)$ as a test statistic. Under the null hypothesis, the $\tilde\varphi(v)$ are asymptotically independent complex Gaussian with mean zero and variance-covariance matrix (of the real and imaginary parts)

$$\frac{1}{2N}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Then,

$$2N\, |\tilde\varphi(v)|^2 \sim \chi^2(2) .$$
Another possibility is to use statistics of the form

$$A(m) = \frac{1}{m} \sum_{v=1}^{m} |\tilde\varphi(v)|^2 .$$

Under the null hypothesis,

$$2N \cdot m \cdot A(m) \sim \chi^2(2m) .$$

In our empirical studies, we discovered that A(1) was the most sensitive statistic to deviations from the null hypothesis.
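
A minimal sketch of the Fourier-coefficient statistics follows, assuming the untied case in which the raw estimator has jumps $w(j)/\sum_j w(j)$ at the points $j/N$. The names `phi`, `A` and `chi2_form`, and the argument `n_coef` (the number of coefficients averaged, written m in the text), are ours.

```python
import cmath
import math

def phi(w, v):
    """Empirical Fourier coefficient: sum over jumps of exp(2*pi*i*v*j/N) * w(j) / sum(w)."""
    n_pooled = len(w)
    total = sum(w)
    acc = 0j
    for j, wj in enumerate(w, start=1):
        acc += wj * cmath.exp(2j * math.pi * v * j / n_pooled)
    return acc / total

def A(w, n_coef):
    """Average squared modulus of the first n_coef Fourier coefficients."""
    return sum(abs(phi(w, v)) ** 2 for v in range(1, n_coef + 1)) / n_coef

def chi2_form(w, n_coef):
    """2 * N * n_coef * A(n_coef), to be compared with a chi-square quantile on 2*n_coef d.f."""
    return 2 * len(w) * n_coef * A(w, n_coef)
```

Because the chi-square distribution is only asymptotic, critical values estimated by simulation under the null hypothesis (as done in Chapter 3) may be preferable for small samples.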

2. Testing for $d(u) \equiv 1$

Here also we have several possibilities. Estimating $d(\cdot)$ by the autoregressive method transforms the null hypothesis into choosing order zero as the best order for the autoregressive estimator. We have used Parzen's CAT criterion and Akaike's criterion to decide on the best order.

We have also looked at some functionals of $\hat d(\cdot)$:

$$\int_0^1 \big(\log \hat d(u)\big)^2\, du ,$$

which is zero under the null hypothesis, and

$$\int_0^1 \big(\hat d(u) - 1\big) \log \hat d(u)\, du ,$$

which is also zero under the null hypothesis. These last functionals were not tested so extensively as the other proposed statistics, and so we report on them in separate tables.

It was found empirically that these functionals increase with the order of the autoregressive estimator used in computing them. For this reason, we have tried two approaches: one was to look only at the value obtained from the first order (as with $|\tilde\varphi(1)|^2$), the other was to subtract a function of the order $k$ (like $(2k + 2)/m$) and find out where the maximum or minimum occurred.
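
Both functionals are easy to approximate numerically once a density estimate is available. The sketch below is ours: it takes any positive function `d` on [0, 1] (for instance an autoregressive estimate) and applies a midpoint rule; the grid size is an arbitrary choice.

```python
import math

def functionals(d, grid=1000):
    """Midpoint-rule approximations of int (log d)^2 du and int (d - 1) log d du on [0, 1]."""
    lg2 = 0.0
    div = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        du = d(u)
        lg2 += math.log(du) ** 2
        div += (du - 1.0) * math.log(du)
    return lg2 / grid, div / grid
```

Both integrands are non-negative and vanish only where the density equals 1, so both functionals are zero for the uniform density and positive otherwise.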
CHAPTER 3

RESULTS OF THE SIMULATIONS

A. METHODOLOGY

We restricted the study to the case where both sample sizes are equal. For each of the sample sizes 10, 20 and 50 we generated 100 samples at different values for the shift parameter $\Delta$: 0.5, 1.0, 2.0. The null case was done with 200 samples.

The distributions involved were:

    $N(0,1)$  vs.  $N(\Delta,1)$  (normal)
    $\tfrac12 e^{-|x|}$  vs.  $\tfrac12 e^{-|x-\Delta|}$  (double-exponential)
    $\dfrac{e^{x}}{(1+e^{x})^{2}}$  vs.  $\dfrac{e^{x-\Delta}}{(1+e^{x-\Delta})^{2}}$  (logistic)
    $\dfrac{1}{\pi(1+x^{2})}$  vs.  $\dfrac{1}{\pi(1+(x-\Delta)^{2})}$  (Cauchy)
    $e^{-x}$  vs.  $e^{-(x-\Delta)}$  (exponential)

We also looked at contamination:

    $N(0,1)$  vs.  $p \cdot N(0,1) + (1-p) \cdot N(0,9)$  for p = 0.1, 0.2 and 0.5.

Note that contamination does not fit in the location problem, but we were determined to test the new procedure in all kinds of test situations.
For a given sample size, the same seed was used to generate the different distributions. The procedure was as follows: generate exponential deviates; obtain from them ordered uniform deviates and, to get a given distribution $F(\cdot)$, apply the inverse transformation $F^{-1}$.

We have used the exact critical values for two-sided tests for all the classical tests except for sample size 50, where we used the corresponding normal approximation.

For the CAT criterion and the new test statistics, we estimated the appropriate quantiles from 200 replications of the null case.
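
The sampling scheme can be sketched as follows. This is our own illustration, not the original generator: it uses the standard fact that normalized partial sums of n + 1 exponential deviates are distributed as n ordered uniform deviates, and the names `ordered_uniforms`, `INVERSE_CDF` and `sample` are hypothetical.

```python
import math
import random
from statistics import NormalDist

def ordered_uniforms(n, rng):
    """n ordered U(0,1) deviates from n + 1 exponential deviates: U_(k) = S_k / S_(n+1)."""
    e = [rng.expovariate(1.0) for _ in range(n + 1)]
    total = sum(e)
    out, acc = [], 0.0
    for k in range(n):
        acc += e[k]
        out.append(acc / total)
    return out

# Inverse distribution functions of the five families used in the study.
INVERSE_CDF = {
    "normal": NormalDist().inv_cdf,
    "double_exponential": lambda u: math.log(2 * u) if u < 0.5 else -math.log(2 * (1 - u)),
    "logistic": lambda u: math.log(u / (1 - u)),
    "cauchy": lambda u: math.tan(math.pi * (u - 0.5)),
    "exponential": lambda u: -math.log(1 - u),
}

def sample(name, n, shift, rng):
    """Ordered sample of size n from the named distribution, shifted by `shift`."""
    return [INVERSE_CDF[name](u) + shift for u in ordered_uniforms(n, rng)]
```

Reusing the same seed for every family, as described above, amounts to transforming the same uniform deviates by different inverse distribution functions, which reduces the sampling noise in comparisons across distributions.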

B. PRESENTATION OF THE EMPIRICAL RESULTS

The tables give the power of the given tests at level $\alpha = .05$. We have produced tables for each sample size.

Here is a list of the abbreviations used:

CAT:      Parzen's criterion to choose the best order for the autoregressive estimator of $d(\cdot)$
W:        Wilcoxon test
VDW:      Van der Waerden test
MED:      Median test
T-TEST:   t-test
$|\tilde\varphi(1)|^2$:  test based on the first empirical Fourier coefficient
LG2:      test based on $\int_0^1 (\log \hat d(u))^2\, du$ where $\hat d(\cdot)$ is the autoregressive estimator of $d(\cdot)$ of order 1
DIV:      test based on $\int_0^1 (\hat d(u) - 1) \log \hat d(u)\, du$ where $\hat d(\cdot)$ is as in LG2
DIVC:     criterion to choose the best order using DIV.

List of Tables

Table 3.1: Critical values of the tests used
Table 3.2: Power for sample size (10,10)
Table 3.3: Power for sample size (20,20)
Table 3.4: Power for sample size (50,50)
Table 3.5: Power of tests LG2, DIV and DIVC (20-20)
Table 3.6: Power of tests LG2, DIV and DIVC (50-50)
TABLE 3.1

CRITICAL VALUES

                                      Sample Size
Test        10                        20                        50
CAT         ≤ -1.1, for orders 1,2    ≤ -1.05, orders → 3       ≤ -1.02, orders → 9
W           ≤ 78.75, ≥ 131.25         ≤ 337, ≥ 483              ≤ 2240.7, ≥ 2809.3
VDW         ≤ -3.88, ≥ 3.88           ≤ -5.75, ≥ 5.75           ≤ -9.46, ≥ 9.46
MED         ≤ 2.17, ≥ 7.83            ≤ 6.3, ≥ 13.7             ≤ 19.5, ≥ 30.5
T-TEST      ≤ -2.101, ≥ 2.101         ≤ -2.021, ≥ 2.021         ≤ -1.98, ≥ 1.98
$|\tilde\varphi(1)|^2$   ≥ .13        ≥ .056                    ≥ .03
LG2         ≥ .31 (order 1)           ≥ .14 (order 1)           ≥ .05 (order 1)
DIV         ≥ .115 (order 1)          ≥ .06 (order 1)

For the DIVC test:

DIVOC)  - 2^  - J*  (S<“>  - X)  log  du  . 


then find $k^*$ that minimizes DIVC(·); if the minimum DIVC($k^*$) > 0, the best order is taken to be zero. For the sample size 10, consider only orders k = 1, 2, 3, 4.
TABLE 3.2

% CORRECT DECISIONS, SAMPLE SIZE 10 - 10

                   CAT    LG2    W      VDW    MED    T-TEST   $|\tilde\varphi(1)|^2$
NULL CASE          94            97     96     96.6   96       93.5
NORMAL     0.5     11            14.7   14     7      15       10
           1.0     27            54.5   54     32.5   57       25
           2.0     84            100    100    93.9   100      91
CAUCHY     0.5     10            3.7    4      4      3        14
           1.0     23            19.7   15     15.1   9        29
           2.0     55            44.2   34     52.1   21       69
LOGISTIC   0.5     8             4      5      3.5    4        9
           1.0     16            16.7   19     11.4   21       10
           2.0     44            69.5   67     51.3   69       42
D. EXP.    0.5     10            10.2   11     8.7    8        18
           1.0     29            38.2   36     31.4   37       30
           2.0     73            95.2   89     84.9   87       84
EXP        0.5     29     33     36     41     23.6   31       33
           1.0     65     71     77     82     61.6   70       71
           2.0     96     98     100    100    97.5   97       100
CONT       0.2     11            3.75   5      3.7    3        14
TABLE 3.3

% CORRECT DECISIONS, SAMPLE SIZE 20 - 20

                   CAT    W      VDW    MED    T-TEST   $|\tilde\varphi(1)|^2$
NULL CASE          95.6   95.6   94.3   95.4   93.6     94.3
NORMAL     0.5     17     31     31     21     28       24
           1.0     45     88     89     67.9   91       61
           2.0     99     100    100    100    100      100
CAUCHY     0.5     11     10     12     12.5   5        24
           1.0     37     36     27     44.9   10       65
           2.0     88     83     70     88.3   25       95
LOGISTIC   0.5     5      20     15     7.9    16       15
           1.0     26     42     42     30.4   40       32
           2.0     68     94     94     84.1   94       86
D. EXP.    0.5     19     23     21     22.2   17       30
           1.0     52     73     67     68.9   59       77
           2.0     99     100    99     98.6   100      100
EXP.       0.5     55     66     70     36.9   40       65
           1.0     96     99     98     91.2   92       97
           2.0     100    100    100    100    99       100
CONT       0.1     13     5      6      3.2    8        16
           0.2     26     6      7      3.2    7        26


TABLE 3.4

% CORRECT DECISIONS, SAMPLE SIZE 50 - 50

                   CAT    W      VDW    MED    T-TEST   $|\tilde\varphi(1)|^2$
NULL CASE          93     93.5   93     94.5   93       92
NORMAL     0.5     41     71     73     63     73       38
           1.0     92     100    100    98.5   100      89
           2.0     100    100    100    100    100      100
CAUCHY     0.5     33     38     32     45     5        37
           1.0     83     75     65     82.5   13       87
           2.0     100    100    99     100    25       100
LOGISTIC   0.5     19     41     38     32.5   38       22
           1.0     58     80     79     74.5   78       61
           2.0     98     100    100    100    100      98
D. EXP.    0.5     55     65     59     68     52       59
           1.0     95     99     99     98.5   91       97
           2.0     100    100    100    100    100      100
EXP.       0.5     91     96     97     69.5   76       81
           1.0     100    100    100    100    100      100
           2.0     100    100    100    100    100      100
CONT       0.1     73     12     20     3      6        20
           0.2     95     22     31     4      7        43
           0.5     93     24.6   32     4.5    5        62
TABLE 3.5

% CORRECT DECISIONS FOR LG2, DIV AND DIVC

SAMPLE SIZE 20-20

                   LG2    DIV    DIVC
NULL CASE          95     95     96
NORMAL     0.5     19
           1.0     53
CAUCHY     0.5     18     24     24
           1.0     65     52
           2.0     95
EXP        0.5     58
           1.0     96
           2.0     100
TABLE 3.6

% CORRECT DECISIONS FOR LG2, DIV AND DIVC

SAMPLE SIZE 50-50

                   LG2    DIV    DIVC
NULL CASE          93.5   93     96
NORMAL     0.5     47     39
           1.0     92     91     88
           2.0     100    100
EXP        0.5     86
           1.0     100
           2.0     100



What do we gather from these tables? We note first that the Wilcoxon test does very well in all the comparisons except in the case of contamination. Also, amongst the new tests, none seems to do as well as $|\tilde\varphi(1)|^2$, and $|\tilde\varphi(1)|^2$ seems to be the best for very heavy tails, as in the Cauchy distribution.


Can we find some theoretical reason to justify the use of these tests?

In our notation, we can write (after dividing the rank sum by mN)

$$W = \int_0^1 u \, d\tilde D(u) .$$

If we integrate by parts, we obtain

$$W = 1 - \int_0^1 \tilde D(u)\, du$$

which can be rewritten as

$$W = \tfrac12 + \int_0^1 \big(u - \tilde D(u)\big)\, du .$$

Note that $\int_0^1 (u - \tilde D(u))\, du$ is a very intuitive test statistic to use to compare $D(u)$ with $u$ when the alternatives we have in mind are of the form $D(u) < u$ or $D(u) > u$, as they are in the location problem.
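
The integration by parts can be checked numerically on a small example. The sketch below is ours and assumes the untied case, where the raw estimator is a step function with jumps $w(j)/m$ at the points $j/N$; the function names are hypothetical.

```python
def normalized_W(w):
    """Integral of u dD~(u): the X rank sum divided by m * N."""
    N, m = len(w), sum(w)
    return sum((j / N) * wj for j, wj in enumerate(w, start=1)) / m

def one_minus_integral_D(w):
    """1 minus the integral of the step function D~ over [0, 1]."""
    N, m = len(w), sum(w)
    acc = 0          # number of X's seen so far; D~ equals acc/m on [(j-1)/N, j/N)
    integral = 0.0
    for wj in w:
        integral += (acc / m) / N
        acc += wj
    return 1 - integral
```

With weights [1, 0, 0, 1] (X ranks 1 and 4, so m = 2 and N = 4) both expressions give 0.625, which is the rank sum 5 divided by mN = 8.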

On the other hand, $|\tilde\varphi(1)|^2$ can be expressed as

$$\left| \int_0^1 \cos 2\pi u \, d\tilde D(u) \right|^2 + \left| \int_0^1 \sin 2\pi u \, d\tilde D(u) \right|^2 .$$

This is the right statistic to use when the alternatives are of the form

$$d(u) = 1 + \theta_1 \cos 2\pi u + \theta_2 \sin 2\pi u$$

and we are testing for $\theta_1 = \theta_2 = 0$.

Parzen has shown in his report (1977) that $\int_0^1 \sin 2\pi u \, d\tilde D(u)$ is the best test statistic for the location problem when the underlying distribution is Cauchy. So it is not surprising that $|\tilde\varphi(1)|^2$ does very well in the case of Cauchy distribution. Its performance in the other cases depends on how well the alternative can be approximated by the form

$$d(u) = 1 + \theta_1 \cos 2\pi u + \theta_2 \sin 2\pi u .$$


We have examined so far the small sample behavior of our test statistics. What about their large sample behavior and in particular what about their asymptotic relative efficiency? Table 3.7 contains the relevant information. We cannot include $|\tilde\varphi(1)|^2$ as its asymptotic distribution is $\chi^2$, whereas the other test statistics are asymptotically normal. But we can see from Tables 3.2, 3.3 and 3.4 that the performance of $|\tilde\varphi(1)|^2$ follows the same pattern as the ARE of the score function $-\sin 2\pi u$: in decreasing order best for Cauchy, then double-exponential, then logistic and finally normal.

In Chapter 4, we present some new graphical displays that allow one to make visual tests of the hypothesis $D(u) = u$.


TABLE 3.7

ASYMPTOTIC RELATIVE EFFICIENCY

                            Distribution
Score Function       Normal    D.-Exp    Logistic
$\Phi^{-1}(u)$       1.00
sign (2u - 1)        .64       1.00
2u - 1               .95       .75       1.00
-sin 2πu             .43       .81       .61
CHAPTER 4

COMPUTER IMPLEMENTATION

A. The algorithm that we propose for the two-sample problem has a simple structure:

1. Compute a set of weights W(·).

2. Take their Fourier transform.

3. Fit autoregressive schemes using the previous Fourier transform as a covariance function.

4. Plot the partial sum of the weights against the uniform distribution (the 45° line) and the smooth autoregressive distribution.

5. Compute any other test statistic that one may wish to compute.

Step-by-Step Analysis

1. The weights W(·) represent the jumps in the function $\tilde D(\cdot)$ estimating $D(\cdot)$; remember the only restriction on $D(\cdot)$ is that under the null hypothesis, $D(u) = u$, $0 \le u \le 1$.

We presently have 3 subroutines to compute the weights, all corresponding to

$$D(u) = F_X(H^{-1}(u)) .$$

In WC, the ties between the samples have been broken by randomization. In WCF, we ignore ties, i.e., we let $\tilde F_X$ and $\tilde H$ have jumps of size larger than 1/m or 1/N. Finally, in KMPARZ, $\tilde F_X$ and $\tilde H$ are Kaplan-Meier estimators designed to handle censored observations. In this last case, we also record the number of jumps and the points at which the jumps occur as they are not of the form j/N.

See the listings for complete documentation.

2. The Fourier transform of the weights is computed in subroutines FORIER1 and FORIER2. The only difference is that in FORIER1 the jumps occur at points of the form j/N while FORIER2 allows for general jumps as in conjunction with KMPARZ (censored data).

3. The coefficients of the different order autoregressive schemes are computed iteratively in subroutine AUTOREG by solving Yule-Walker equations.
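
An iterative solution of the Yule-Walker equations can be sketched with the Levinson-Durbin recursion. This is not the AUTOREG listing: it is a simplified illustration for a real-valued correlation sequence (the empirical Fourier coefficients used in the package are complex), and the function name and return format are our own.

```python
def levinson_durbin(rho, max_order):
    """Solve the Yule-Walker equations for orders 1..max_order.

    rho: correlations with rho[0] = 1 and rho[k] the lag-k correlation.
    Returns a list of (coefficients, residual_variance) pairs, one per order,
    for the prediction form x_t = c_1 x_{t-1} + ... + c_k x_{t-k} + e_t.
    """
    coeffs = []
    sigma2 = rho[0]
    results = []
    for k in range(1, max_order + 1):
        # Partial correlation of order k from the previous-order coefficients.
        acc = rho[k] - sum(coeffs[j] * rho[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2
        # Update the first k-1 coefficients and append the new one.
        coeffs = [coeffs[j] - kappa * coeffs[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2 *= 1 - kappa ** 2
        results.append((list(coeffs), sigma2))
    return results
```

For a first-order scheme with correlations 1, 0.5, 0.25 the recursion returns the coefficient 0.5 at order 1 and adds a zero coefficient at order 2, with residual variance 0.75 in both cases.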

4. It is recommended to make several plots of $\tilde D(u)$, $D(u) = u$ and the smooth autoregressive estimators, one for each of a few successive orders of the estimator (perhaps up to 5) and study how they vary. Ease of interpretation will come with experience, but one can see whether the shapes change and whether the estimator intersects the uniform distribution.

B. Case Histories.

The first two sets of data that we use to illustrate our package come from Maguire et al (1952).

1. The first set of data represents lengths of time between serious accidents in the mines of Great Britain. We previously found some evidence of non-homogeneity in this data (Carmichael (1976)). This time we split the data into two groups:

1st Group: the first 60 values, 2nd Group: the remaining 49 values (the data is in chronological order).

We present the following exhibits:

Table 4.1: The two sets of data and some descriptive statistics.

Table 4.2: The weights as computed from WC.

Fig. 4.1-4.5: Superposition of the autoregressive estimator, $\tilde D(\cdot)$ and the uniform distribution.

Table 4.3: Classical nonparametric tests.


TABLE 4.3

CLASSICAL NONPARAMETRIC TESTS

Test                      Observed    Critical Values (95%)
Wilcoxon                  2910        2978 - 3621
Van der Waerden           -11.75      -9.85 - 9.85
Median                    21          21 - 35
Savage                    47.12       50.03 - 69.97
$|\tilde\varphi(1)|^2$    .0246       .0275
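
The Wilcoxon limits in Table 4.3 are consistent with the normal approximation of Chapter 1. The sketch below is our own check, not part of the package; the function name `wilcoxon_limits` is hypothetical.

```python
import math
from statistics import NormalDist

def wilcoxon_limits(m, n, level=0.95):
    """Two-sided limits for the X rank sum from its normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    mean = m * (m + n + 1) / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    return mean - z * sd, mean + z * sd
```

With m = 60 and n = 49 the limits are approximately 2978.2 and 3621.8, in agreement with the tabulated 2978 - 3621; the observed value 2910 falls below the lower limit.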
TABLE 4.1 A

SEQUENCE
WITHIN
QUARTILE     FIRST QUARTER   SECOND QUARTER   THIRD QUARTER   FOURTH QUARTER
 1                  1.0000          50.0000        108.0000         275.0000
 2                  4.0000          54.0000        113.0000         275.0000
 3                 11.0000          55.0000        114.0000         286.0000
 4                 13.0000          58.0000        120.0000         312.0000
 5                 15.0000          59.0000        123.0000         315.0000
 6                 15.0000          59.0000        124.0000         326.0000
 7                 17.0000          61.0000        137.0000         345.0000
 8                 20.0000          61.0000        151.0000         354.0000
 9                 22.0000          72.0000        176.0000         361.0000
10                 23.0000          78.0000        188.0000         378.0000
11                 28.0000          78.0000        189.0000         457.0000
12                 31.0000          81.0000        203.0000         467.0000
13                 32.0000          93.0000        215.0000         644.0000
14                 36.0000          96.0000        217.0000         871.0000
15                 48.0000          99.0000        233.0000        1205.0000

SUM               316.0000        1054.0000       2411.0000        6871.0000
SUM OF
SQUARES          8828.0000       77788.0000     414757.0000     4105777.0000

SAMPLE SIZE = 60
MEDIAN = 103.500
MEAN = 177.533
INNER FOURTHS MEAN = 115.500
OUTER FOURTHS MEAN = 239.567
VARIANCE = 46034.999
STANDARD DEVIATION = 214.558
INTERQUARTILE RANGE = 227.000
TRIMEAN = 132.500
GASTWIRTH'S ESTIMATE = 115.500
.05-WINSORIZED MEAN = 147.933
.05-TRIMMED MEAN = 146.593
.10-WINSORIZED MEAN = 141.017
.10-TRIMMED MEAN = 136.896
.25-WINSORIZED MEAN = 123.783
.25-TRIMMED MEAN = 115.500
TABLE 4.1 B

ORDER STATISTICS IN QUARTERS

SEQUENCE
WITHIN
QUARTILE     FIRST QUARTER   SECOND QUARTER   THIRD QUARTER   FOURTH QUARTER
 1                  4.0000          72.0000        217.0000         338.0000
 2                  7.0000          75.0000        224.0000         348.0000
 3                 18.0000         120.0000        228.0000         364.0000
 4                 19.0000         129.0000        255.0000         369.0000
 5                 19.0000         131.0000        271.0000         390.0000
 6                 20.0000         145.0000        275.0000         498.0000
 7                 29.0000         156.0000        291.0000         517.0000
 8                 37.0000         171.0000        312.0000         566.0000
 9                 47.0000         182.0000        326.0000         745.0000
10                 49.0000         195.0000        329.0000        1312.0000
11                 54.0000         208.0000        330.0000        1357.0000
12                 66.0000         217.0000        336.0000        1613.0000
13                                                                 1630.0000

SUM               369.0000        1801.0000       3394.0000       10047.0000
SUM OF
SQUARES         15603.0000      295115.0000     981678.0000    10868241.0000

SAMPLE SIZE = 49
MEDIAN = 217.000
MEAN = 318.592
INNER FOURTHS MEAN = 215.450
OUTER FOURTHS MEAN = 416.040
VARIANCE = 149731.247
STANDARD DEVIATION = 386.951
INTERQUARTILE RANGE = 272.000
TRIMEAN = 209.500
GASTWIRTH'S ESTIMATE = 223.900
.05-WINSORIZED MEAN = 260.245
.05-TRIMMED MEAN = 274.600
.10-WINSORIZED MEAN = 243.735
.10-TRIMMED MEAN = 235.393
.25-WINSORIZED MEAN = 204.959
.25-TRIMMED MEAN = 221.320
TABLE 4.2

W(  1) .. W( 10) =  1. 0. 1. 0. 1. 1. 1. 1. 1. 0.
W( 11) .. W( 20) =  0. 0. 1. 0. 1. 1. 1. 0. 1. 1.
W( 21) .. W( 30) =  1. 0. 0. 1. 0. 1. 0. 1. 1. 1.
W( 31) .. W( 40) =  1. 1. 1. 1. 0. 0. 1. 0. 1. 1.
W( 41) .. W( 50) =  1. 1. 1. 1. 1. 1. 1. 1. 0. 1.
W( 51) .. W( 60) =  1. 0. 0. 1. 0. 1. 0. 0. 1. 0.
W( 61) .. W( 70) =  1. 1. 0. 1. 0. 1. 0. 1. 0. 0.
W( 71) .. W( 80) =  0. 1. 0. 0. 1. 0. 1. 1. 0. 1.
W( 81) .. W( 90) =  0. 1. 1. 0. 0. 0. 0. 0. 1. 0.
W( 91) .. W(100) =  1. 1. 0. 0. 1. 0. 1. 1. 0. 0.
W(101) .. W(109) =  0. 1. 0. 1. 1. 0. 0. 0. 0.

Interpretations:

We note from Figures 4.1-4.5 that the smooth estimator of $D(\cdot)$ is above the line $D(u) = u$ almost in the entire range, and this is more pronounced with large values of u. This is in agreement with our estimation of the density of the combined samples referred to earlier. All the nonparametric tests would reject the hypothesis of no difference.

This is in sharp contrast with the results obtained by Maguire et al, where all the tests they performed failed to detect any significant difference.

The new test based on $|\tilde\varphi(1)|^2$ does not reject the null hypothesis either. What we need to give an answer is a combination of looking at test statistics and at the pictures.

From this, we would conclude that the two groups don't differ for low values, but that the second group has experienced the longer interarrival times with higher probability.

2. The second set of data is of the same type: length of time between accidents in divisions 3 and 7 of the same mine.

The exhibits:

Table 4.4: The two sets of data and some descriptive statistics.

Table 4.5: The weights as computed in WC.

Fig. 4.6-4.10: Pictures obtained.

Table 4.6: The weights as computed in WCF.

Fig. 4.11-4.15: Pictures obtained.

Table 4.7: Classical nonparametric tests.
TABLE 4.4 A

SEQUENCE
WITHIN
QUARTILE     FIRST QUARTER   SECOND QUARTER   THIRD QUARTER   FOURTH QUARTER
 1                  0.0000           3.0000          6.0000          11.0000
 2                  0.0000           3.0000          6.0000          11.0000
 3                  0.0000           4.0000          6.0000          11.0000
 4                  1.0000           4.0000          7.0000          12.0000
 5                  1.0000           5.0000          7.0000          13.0000
 6                  1.0000           5.0000          8.0000          13.0000
 7                  1.0000           5.0000          8.0000          13.0000
 8                  1.0000           5.0000          8.0000          13.0000
 9                  2.0000           5.0000          9.0000          17.0000
10                  2.0000                                           18.0000

SUM                 9.0000          39.0000         65.0000         132.0000
SUM OF
SQUARES            13.0000         175.0000        479.0000        1796.0000

SAMPLE SIZE = 38
MEDIAN = 5.500
MEAN = 6.447
INNER FOURTHS MEAN = 5.778
OUTER FOURTHS MEAN = 7.050
VARIANCE = 23.876
STANDARD DEVIATION = 4.886
INTERQUARTILE RANGE = 9.000
TRIMEAN = 6.000
GASTWIRTH'S ESTIMATE = 5.800
.05-WINSORIZED MEAN = 5.974
.05-TRIMMED MEAN = 6.306
.10-WINSORIZED MEAN = 5.921
.10-TRIMMED MEAN = 6.156
.25-WINSORIZED MEAN = 5.816
.25-TRIMMED MEAN = 5.850
TABLE 4.4 B

ORDER STATISTICS IN QUARTERS

SEQUENCE
WITHIN
QUARTILE     FIRST QUARTER   SECOND QUARTER   THIRD QUARTER   FOURTH QUARTER
 1                  0.0000           1.0000          2.0000           6.0000
 2                  0.0000           1.0000          2.0000           6.0000
 3                  0.0000           1.0000          3.0000           6.0000
 4                  0.0000           1.0000          3.0000           7.0000
 5                  0.0000           1.0000          3.0000           7.0000
 6                  0.0000           1.0000          4.0000           8.0000
 7                  0.0000           2.0000          4.0000           8.0000
 8                  0.0000           2.0000          4.0000           9.0000

SUM OF
SQUARES                                                            2166.

SAMPLE SIZE = 57
MEDIAN = 2.000
MEAN = 4.298
INNER FOURTHS MEAN = 2.73
OUTER FOURTHS MEAN = 5.759
VARIANCE = 24.963
STANDARD DEVIATION = 4.996
INTERQUARTILE RANGE = 5.000
TRIMEAN = 2.750
GASTWIRTH'S ESTIMATE = 2.600
.05-WINSORIZED MEAN = 3.842
.05-TRIMMED MEAN = 3.849
.10-WINSORIZED MEAN = 3.737
.10-TRIMMED MEAN = 3.425
.25-WINSORIZED MEAN = 3.070
.25-TRIMMED MEAN = 2.897
TABLE 4.7

Test                      Observed    Critical Values (95%)
Wilcoxon                  2203        1566 - 2082
Van der Waerden           12.69       -9.023 - 9.023
Median                    29          13 - 24
Savage                    47.33       28.87 - 47.13
$|\tilde\varphi(1)|^2$    .0492       .0316

Interpretations:

Even though the raw estimators of $D(\cdot)$ differ in appearance depending on whether we randomize the ties (Fig. 4.6-4.10) or allow for jumps of size greater than 1/N (Fig. 4.11-4.15), the smooth estimators are very comparable. In this case, they are below the line $D(u) = u$, meaning that the second group is smaller than the first one.


All  the  tests  agree 
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